# Reinforcement Learning Fine-tuning

**Finetuned Tamil Llama 7B** · Jaggu05 · 73 · 1
A supervised fine-tuning (SFT) model built with the Transformers library, designed to improve the performance of a Tamil-adapted Llama 7B base model.
*Large Language Model · Transformers*
**Qwen3 0.6B TLDR LoRA** (Apache-2.0) · phh · 56 · 0
A LoRA adapter for Qwen3-0.6B, an open-source Transformer-based language model with 600 million parameters, suited to natural language processing tasks such as text summarization.
*Text Generation*
**Qwen 2.5 7B Base RAG RL** · XXsongLALA · 859 · 7
Qwen-2.5-7B-base-RAG-RL is a 7B-parameter large language model, trained from scratch on an undisclosed dataset, that incorporates Retrieval-Augmented Generation (RAG) and Reinforcement Learning (RL).
*Large Language Model · Transformers*
**Phi 4 Reasoning Plus** (MIT) · microsoft · 19.83k · 261
Phi-4-reasoning-plus is an open-weight reasoning model from Microsoft Research, built on Phi-4 and optimized with supervised fine-tuning and reinforcement learning, focusing on advanced reasoning in mathematics, science, and coding.
*Large Language Model · Transformers · Supports Multiple Languages*
**Deepcoder 1.5B Preview AWQ** (MIT) · adriabama06 · 72 · 2
DeepCoder-1.5B-Preview is a code-reasoning large language model fine-tuned from DeepSeek-R1-Distilled-Qwen-1.5B via distributed reinforcement learning, capable of handling longer context lengths.
*Large Language Model · Transformers · English*
**Deephermes ToolCalling Specialist Atropos** · NousResearch · 64 · 4
An experimental model from Nous Research, fine-tuned with the Atropos reinforcement learning framework to improve the tool-calling performance of Llama-3.1 8B in reasoning mode.
*Large Language Model · Transformers · English*
**Qwen2.5 0.5B Instruct Gensyn Swarm Fierce Placid Whale** · gangchen · 3,053 · 2
A fine-tuned version of Gensyn/Qwen2.5-0.5B-Instruct, trained with the TRL framework and the GRPO algorithm.
*Large Language Model · Transformers*
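GRPO training with TRL, as used for the Gensyn swarm model above, optimizes a policy against scalar reward functions that score each sampled completion. A minimal sketch of such a reward function follows; the format-checking rule is an illustrative assumption, not Gensyn's actual reward:

```python
import re

# Toy GRPO-style reward: assign each sampled completion a scalar score.
# TRL's GRPOTrainer accepts callables of this shape via `reward_funcs`;
# the <answer>...</answer> format rule here is an illustrative assumption.
def format_reward(completions, **kwargs):
    pattern = re.compile(r"<answer>.+?</answer>", re.DOTALL)
    return [1.0 if pattern.search(c) else 0.0 for c in completions]

rewards = format_reward(["<answer>42</answer>", "no tags here"])
print(rewards)  # prints [1.0, 0.0]
```

GRPO then normalizes these rewards within each group of completions sampled for the same prompt, so no separate value model is needed.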
**Notbad v1.0 Mistral 24B** (Apache-2.0) · notbadai · 29 · 5
Notbad v1.0 Mistral 24B focuses on mathematical and Python programming reasoning; it is based on Mistral-Small-24B-Instruct-2501 and further trained with reinforcement learning.
*Large Language Model · Transformers*
**EXAONE 3.5 2.4B Fine Tuning** · good593 · 65 · 2
A fine-tuned model based on EXAONE-3.5-2.4B, built with Hugging Face's Transformers library and supporting various natural language processing tasks.
*Large Language Model · Transformers*
**Qwen2.5 0.5B Instruct** (Apache-2.0) · Gensyn · 2.4M · 5
A 0.5B-parameter instruction fine-tuned model designed for the Gensyn reinforcement learning swarm, supporting local fine-tuning training.
*Large Language Model · Transformers · English*
**Alignprop Trl Aesthetics** (Apache-2.0) · mihirpd · 15 · 1
A text-to-image model fine-tuned from Stable Diffusion v1.5 with an aesthetic reward function on animal datasets, trained via reward backpropagation.
*Image Generation*
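Reward backpropagation, as used by the AlignProp model above, differentiates a reward signal directly through the generator instead of relying on a policy-gradient estimator. A one-variable toy sketch of the idea follows; everything here (the scalar "generator" parameter, the quadratic "aesthetic" reward, the step size) is illustrative, not the model's actual implementation:

```python
# Minimal reward-backpropagation sketch: follow the analytic reward
# gradient through the generator parameter by plain gradient ascent.
def reward(x):
    return -(x - 1.0) ** 2      # toy "aesthetic" reward, maximized at x = 1

def reward_grad(x):
    return -2.0 * (x - 1.0)     # analytic gradient of the reward

theta = 0.0                      # stand-in for the generator's parameters
for _ in range(100):
    theta += 0.1 * reward_grad(theta)   # ascend the reward directly
print(round(theta, 3))  # prints 1.0, the reward maximum
```

In the real setting the reward comes from a differentiable aesthetic scorer and the gradient is obtained by autodiff through the full denoising chain, but the update rule is the same ascent on the reward.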
**Vlrm Blip2 Opt 2.7b** (MIT) · sashakunitsyn · 398 · 17
A BLIP-2 OPT-2.7B model fine-tuned with reinforcement learning, capable of generating long, detailed image descriptions.
*Image-to-Text · Transformers · English*
**Codellama 7b Hf ReFT GSM8k** · lqtrung1998 · 38 · 1
Enhances the reasoning generalization of large language models through reinforced fine-tuning (ReFT); fine-tuned from CodeLlama and suited to code generation and comprehension tasks.
*Large Language Model · Transformers*
**Blip Image Captioning Large Mocha** (MIT) · moranyanuka · 188 · 10
The official fine-tuned version of BLIP-Large, optimized on the MS-COCO dataset with the MOCHa reinforcement learning framework to mitigate open-vocabulary caption hallucination.
*Image-to-Text · Transformers*